[Fix] Restore missing commas in extract_characters_regex prefix list by YoungZSh · Pull Request #1560 · open-compass/VLMEvalKit

YoungZSh · 2026-05-30T09:08:09Z

Summary

Two pairs of adjacent string literals in the answer_prefixes list inside
extract_characters_regex (vlmeval/dataset/utils/multiple_choice.py) are
missing commas, so Python silently concatenates them at parse time. Four
prefixes that were intended to be stripped before regex matching are
therefore never stripped:

'The best option is' + 'The correct option is'
→ 'The best option isThe correct option is'
'Best answer:' + 'Best option:'
→ 'Best answer:Best option:'

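The two Best ... prefixes are the harmful pair, because the B in Best
is itself an option letter: a model that responds with e.g.
"Best answer: D" will have the leading B matched by
re.search(r'[ABCDE]', s) and be scored as B instead of D. The other
pair does not change the extracted letter, since neither
'The best option is' nor 'The correct option is' contains an uppercase A-E.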
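
The failure mode described above can be reproduced in isolation. The snippet below is a minimal sketch, not the upstream function: `extract_letter` is a hypothetical stand-in that only strips prefixes and takes the first match of `[ABCDE]`, and `buggy_prefixes` contains just the two affected pairs rather than the full `answer_prefixes` list.

```python
import re

# Only the two affected pairs, written the way they were before the fix.
# Adjacent string literals with no comma between them are joined by Python
# when the source is parsed.
buggy_prefixes = [
    'The best option is'      # missing comma: merges with the next literal
    'The correct option is',
    'Best answer:'            # missing comma: merges with the next literal
    'Best option:',
]


def extract_letter(s, prefixes):
    """Hypothetical helper: strip known prefixes, then return the first option letter."""
    s = s.strip()
    for prefix in prefixes:
        s = s.replace(prefix, '')
    match = re.search(r'[ABCDE]', s)
    return match[0] if match else ''


print(len(buggy_prefixes))  # 2 elements, not 4
print(buggy_prefixes[1])    # Best answer:Best option:
print(extract_letter('Best answer: D', buggy_prefixes))  # B (the 'B' of 'Best')
```

Because the merged string 'Best answer:Best option:' never occurs in a real response, the `replace` call is a no-op and the search lands on the first character of the prefix.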

Fix

Add the two missing commas so each of the four prefixes is its own list
element and gets stripped as intended. No behaviour change for predictions
that did not already contain "Best answer:" / "Best option:" prefixes.
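
As a rough check of the "no behaviour change" claim, the sketch below compares the merged and the corrected prefix lists on a few sample responses. It rests on the same assumptions as the description: `extract_letter` is a hypothetical, simplified stand-in for `extract_characters_regex`, both lists contain only the four affected prefixes, and the sample responses are made up for illustration.

```python
import re

# Prefix list as it was parsed before the fix (two merged elements).
old_prefixes = [
    'The best option isThe correct option is',
    'Best answer:Best option:',
]

# Prefix list after restoring the two commas (four separate elements).
new_prefixes = [
    'The best option is',
    'The correct option is',
    'Best answer:',
    'Best option:',
]


def extract_letter(s, prefixes):
    """Hypothetical helper: strip known prefixes, then return the first option letter."""
    s = s.strip()
    for prefix in prefixes:
        s = s.replace(prefix, '')
    match = re.search(r'[ABCDE]', s)
    return match[0] if match else ''


# Illustrative responses only; the second field says whether the result should change.
samples = [
    ('Best answer: D', True),
    ('Best option: C', True),
    ('The best option is C', False),
    ('The correct option is E', False),
    ('A', False),
]

for response, should_change in samples:
    before = extract_letter(response, old_prefixes)
    after = extract_letter(response, new_prefixes)
    assert (before != after) == should_change
    print(f'{response!r}: {before} -> {after}')
```

Only the responses that start with a Best ... prefix change their extracted letter; the others come out the same with either list.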

Note on CI regression scores

Since this fix corrects scoring behaviour, the pr_run_test regression
scores could shift marginally if any of the test models happens to emit a
"Best answer:" / "Best option:" prefix on the mini sets. Such a shift
would be the fix working as intended (those answers were previously
mis-scored as B), not a regression.


YoungZSh · 2026-06-12T12:10:10Z

Hi @mzr1996 @kennymckormick, gentle ping on this one — it's a 2-line fix for a scoring correctness bug: responses like "Best answer: D" currently get scored as B, because two missing commas merge the Best answer: / Best option: prefixes in extract_characters_regex so they are never stripped. Would appreciate a quick look (and a CI approval run) when you have a chance. Thanks!

YoungZSh and others added 2 commits May 30, 2026 17:07

Merge branch 'main' into fix/extract-characters-regex-missing-commas

ca8210b

YoungZSh wants to merge 2 commits into open-compass:main from YoungZSh:fix/extract-characters-regex-missing-commas
